Classifying Korean comparative sentences for comparison analysis
Abstract
Comparisons sort objects based on their superiority or inferiority, and they may have major effects on a variety of evaluation processes. The Web facilitates qualitative and quantitative comparisons via online debates, discussion forums, product comparison sites, etc., and comparison analysis is becoming increasingly useful in many application areas. This study develops a method for classifying sentences in Korean text documents into several different comparative types to facilitate their analysis. We divide our study into two tasks: 1) extracting comparative sentences from text documents, and 2) classifying comparative sentences into seven types. In the first task, we investigate many actual comparative sentences by referring to previous studies and construct a lexicon of comparisons. Sentences that contain elements from the lexicon are regarded as comparative sentence candidates. Next, we use machine learning techniques to eliminate non-comparative sentences from the candidates. In the second task, we roughly classify the comparative sentences using keywords and use a transformation-based learning method to correct initial classification errors. Experimental results show that our method could be suitable for practical use. We obtained an F1-score of 90.23% in the first task, an accuracy of 81.67% in the second task, and an overall accuracy of 88.59% for the integrated system with both tasks.
1 Introduction
In many areas, comparisons are very important during decision making. For example, politicians may change their political strategies after monitoring how their policies compare with those of their competitors. Manufacturers can also change their marketing strategies after comparing their products with those of their competitors. A similar situation also applies to customers. If a customer is deciding whether to buy Car-A or Car-B, he/she will probably access the Web and type these two items into the search box. A search engine such as Google will then find relevant documents. Next, the customer will open and read each retrieved document until he/she obtains enough information. The customer's decision may be dominated by sentences that compare these two items.
It is clear that obtaining information from the Web is a good and easy solution. However, it is also clear that reading many documents until sufficient information has been acquired is still a time-consuming task. If the customer only has access to a small amount of data, he/she may form biased views. By contrast, reading large amounts of data demands an enormous amount of time and effort. Therefore, it would be very useful in many …
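As a rough illustration of the first task described in the abstract, the sketch below marks a sentence as a comparative-sentence candidate when it contains an element of a comparison lexicon, and computes an F1-score over boolean labels. The four-entry lexicon, the sample sentences, and the function names are assumptions made for this example only; they are not the lexicon, data, or code used in the paper, and the machine learning filter and the transformation-based learning step are not implemented here.

```python
# Hypothetical mini-lexicon of Korean comparison markers (illustrative only;
# the paper's actual lexicon is not reproduced here).
COMPARATIVE_LEXICON = ("보다", "가장", "비해", "같은")

# Hypothetical sample sentences for the sketch.
SENTENCES = [
    "A가 B보다 빠르다",            # "A is faster than B": a true comparative
    "나는 영화를 보다가 잠들었다",  # "I fell asleep while watching a movie"
    "오늘은 날씨가 좋다",           # "The weather is nice today"
]


def is_candidate(sentence, lexicon=COMPARATIVE_LEXICON):
    """Return True if the sentence contains any lexicon element as a substring."""
    return any(keyword in sentence for keyword in lexicon)


def extract_candidates(sentences, lexicon=COMPARATIVE_LEXICON):
    """Keep only the sentences that match the lexicon."""
    return [s for s in sentences if is_candidate(s, lexicon)]


def f1_score(predicted, gold):
    """F1 over two parallel lists of booleans (True = comparative)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


candidates = extract_candidates(SENTENCES)
```

In this toy data the second sentence matches the lexicon only because "보다" occurs inside "보다가", so it becomes a candidate without being a comparison. Cases like this are the kind of non-comparative candidate that a subsequent filtering step would need to remove.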
Similar resources
Extracting Comparative Entities and Predicates from Texts Using Comparative Type Classification
The automatic extraction of comparative information is an important text mining problem and an area of increasing interest. In this paper, we study how to build a Korean comparison mining system. Our work is composed of two consecutive tasks: 1) classifying comparative sentences into different types and 2) mining comparative entities and predicates. We perform various experiments to find releva...
Parsing Korean Comparative Constructions in a Typed-Feature Structure Grammar
Jong-Bok Kim, Jaehyung Yang, and Sanghoun Song. 2010. Parsing Korean Comparative Constructions in a Typed-Feature Structure Grammar. Language and Information 14.1, 1–24. The complexity of comparative constructions in each language has posed challenges for both theoretical and computational analyses. This paper first identifies types of comparative constructions in Korean and discusses their mai...
The Improvement of Negative Sentences Translation in English-to-Korean Machine Translation
This paper describes the algorithm for translating English negative sentences into Korean in English-Korean Machine Translation (EKMT). The proposed algorithm is based on the comparative study of English and Korean negative sentences. The earlier translation software cannot translate English negative sentences into accurate Korean equivalents. We established a new algorithm for the negative sen...
Extracting Comparative Sentences from Korean Text Documents Using Comparative Lexical Patterns and Machine Learning Techniques
This paper proposes a method for automatically identifying Korean comparative sentences in text documents. This paper first investigates many comparative sentences with reference to previous studies and then defines a set of comparative keywords from them. A sentence that contains one or more elements of the keyword set is called a comparative-sentence candidate. Finally, we use machine learning technique...
Finding relevant features for Korean comparative sentence extraction
In this paper, we study how to extract comparative sentences from Korean text documents. We decompose our task into three steps: 1) collecting comparative keywords; 2) extracting comparative-sentence candidates by keyword searching; 3) eliminating non-comparative sentences from these candidates using machine learning techniques. We perform various experiments to find relevant features. As a res...
Journal title:
- Natural Language Engineering
Volume 20
Publication date 2014